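This document contains both the fill-in-the-blank and the completed scripts, which can be used for the presentation. Where a chunk appears twice, the first version has blanks to fill in and the second is the completed version.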

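This is going to be a basic overview of making some basic plots in ggplot. We will cover scatter plots, bar charts, box and whisker plots, violin plots and density plots, and then some ways to customize them.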

You should think of this as a “Best hits of intro to ggplot”. I have gone through and collected the material I found most helpful when learning ggplot. All of these sources will be linked as we go through. If you want more information on a particular topic, they are great places to start.

The first step is to load all the libraries we might need. Make sure these are installed (if you don’t know how to install packages, look here).

It’s always a good idea to load tidyverse; that way, if you need to clean up any data before plotting, you’ll be good to go. here is a helpful package for building file paths relative to your project folder, which makes loading and saving files easier. You may not need to use it here. ggplot2 is the package that does the actual plotting. library(tidyverse) already loads it, but loading it explicitly does no harm. Finally, patchwork is very exciting, and I will show you exactly what it does later.

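# install.packages(c("tidyverse", "here", "patchwork")) # run once if not installed
library(tidyverse) # data cleaning; also attaches ggplot2
library(here)      # builds file paths relative to the project folder
library(ggplot2)   # plotting (already attached by tidyverse, harmless to repeat)
library(patchwork) # combining several plots into one figure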

A few notes on ggplot

Basic Anatomy of a ggplot

p=ggplot(data,aes(aes1,aes2))+ #the data plus global aesthetics, which every layer inherits (required)
    geom_X(aes(aes3))+ #X=point|bar|violin|etc; aesthetics here apply to this layer only. You can have many geoms in one plot (at least one is required)
    theme() #a lot of your specifications will go here (not required)

p #this is how you get your plot to show up

If you don’t assign the plot to an object, it will show up automatically.

ggplot(data,aes(aes1,aes2))+ 
    geom_X(aes(aes3))+ 
    theme()
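To make the anatomy concrete, here is a small runnable sketch. It uses the built-in mtcars data as a stand-in, since nothing from the workshop files is needed. It shows the difference between global aesthetics in ggplot(), which every layer inherits, and a layer aesthetic inside geom_point(), which only that layer uses.

```r
library(ggplot2)

# Data and global aesthetics go in ggplot(); every layer inherits them.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Layer aesthetic: only the points are coloured by number of cylinders.
  geom_point(aes(colour = factor(cyl))) +
  # This layer only uses the inherited x and y, so it fits one line to all points.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Optional: theme settings change the look, not the data.
  theme_bw()

p # printing the object draws the plot
```

Because colour is only mapped in geom_point(), the trendline is fitted to all points together rather than once per cylinder group.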

Scatter Plots

Scatter plots are an excellent first plot to start with. There are lots of ways to manipulate scatter plots to give very informative figures, which you will see further down this page.

The data and further information on making scatterplots can be found here.

First things first: load the data. What I have written in this chunk may not work for you. You may have to do something along the lines of scatter=read.csv(file.choose()) and then select scatter.csv from wherever you saved it on your computer.

It’s always a good idea to look at the data and make sure it loaded properly before you start plotting. This also makes sure you know what the column names are.

scatter=read.csv(here("data/scatter.csv"))%>%dplyr::select(-X)
head(scatter)
##       country continent lifeExp      pop  gdpPercap
## 1 Afghanistan      Asia  43.828 31889923   974.5803
## 2     Albania    Europe  76.423  3600523  5937.0295
## 3     Algeria    Africa  72.301 33333216  6223.3675
## 4      Angola    Africa  42.731 12420476  4797.2313
## 5   Argentina  Americas  75.320 40301927 12779.3796
## 6   Australia   Oceania  81.235 20434176 34435.3674

Basic Scatter Plot

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_()
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point()

Scatter plot with trendline

Sometimes you want to add a trendline. This dataset does not give a nice linear trend as is, so we tweak it a bit by taking the log of gdpPercap. This is very easy: just wrap the x variable in log().

For more information on how to get a line of best fit, see the documentation for geom_smooth.

ggplot(scatter,aes(x=,y=lifeExp))+
  geom_point()+
  geom_(="lm")
ggplot(scatter,aes(x=log(gdpPercap),y=lifeExp))+
  geom_point()+
  geom_smooth(method="lm")
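Here is a sketch of two ways to get a trendline on a log scale. It uses ggplot2’s built-in msleep data (body weight vs. total sleep) as a stand-in for the scatter data.

- Wrapping the variable in log() inside aes(), as above, puts log values on the axis.
- scale_x_log10() transforms the axis instead. The line is still fitted on the log scale, but the tick labels stay in the original units.

The method, formula and se arguments are standard geom_smooth() options.

```r
library(ggplot2)

# Option 1: transform inside aes(), as in the text above (natural log).
p_log <- ggplot(msleep, aes(x = log(bodywt), y = sleep_total)) +
  geom_point() +
  # method = "lm" fits a straight line; giving the formula explicitly
  # silences the "using formula" message; se = TRUE draws the ribbon.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Option 2: keep the raw variable and put the axis on a log10 scale.
p_scale <- ggplot(msleep, aes(x = bodywt, y = sleep_total)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  scale_x_log10()

p_log
p_scale
```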

Scatter plot with different aes for the points

When you have useful metadata associated with the variables you are plotting, it is nice to incorporate it into your figures. You can normally change the following:

  • Shape
  • Colour
  • Fill
  • Alpha
  • Size

Note: We have taken the log() away from the X value.

Let’s change the colour of the points based on continent and scale their size based on pop.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))
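One thing that trips people up is the difference between two ways of using an aesthetic:

- Mapping it to a column, inside aes(), gives one colour per category and a legend.
- Setting it to a fixed value, outside aes(), applies that value to every point.

Here is a sketch using ggplot2’s built-in mpg data as a stand-in.

```r
library(ggplot2)

# Mapped: colour follows the drv column (3 categories) and a legend appears.
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = drv))

# Set: every point is steelblue and there is no legend.
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "steelblue", size = 2)

# Common slip: a quoted constant inside aes() is treated as a one-level
# category called "steelblue", so ggplot picks its own colour and adds a legend.
p_quoted <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = "steelblue"))

# Print each object (e.g. p_quoted) to compare the results.
```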

Let’s change the shape of the points.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(size=pop))
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(shape=continent,size=pop))

You can see ggplot picks a default shape for each continent, and some of them look a bit strange. You can choose the shapes yourself by using a number code. You can find all the codes here.
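Here is a sketch of choosing shapes yourself with scale_shape_manual(), using the built-in iris data (three species) as a stand-in. ggplot’s default shape palette only handles up to 6 categories. Choosing the codes yourself gets around that limit as well as giving you the shapes you want.

```r
library(ggplot2)

# Shape codes are integers from 0 to 25.
# Here: 16 = solid circle, 17 = solid triangle, 15 = solid square.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(shape = Species), size = 2) +
  scale_shape_manual(values = c(16, 17, 15))
p
```

The values are assigned to the species in level order. You can also pass a named vector, such as c(setosa = 16, versicolor = 17, virginica = 15), to pin each shape to a category.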

What’s the difference between fill and colour? Also, why are there different versions of the same shape?
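On the two questions above:

- For shapes 21 to 25, colour is the border and fill is the inside. Shapes 0 to 20 only have a colour.
- That is also why some shapes seem to come in several versions. For example, 1 is an open circle, 16 is a solid circle, and 21 is a circle with a separate border and fill.

Here is a sketch with the built-in iris data as a stand-in.

```r
library(ggplot2)

# Shape 21: border set to black via colour, interior mapped to Species via fill.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(fill = Species), shape = 21, colour = "black",
             size = 3, stroke = 0.8) # stroke controls border thickness
p
```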

Bar Charts and Violin Plots

What I have written in this chunk may not work for you. You may have to do something along the lines of bar=read.csv(file.choose()) and then select the bar_plot.csv from wherever you saved it on your computer.

The dataset we will be using looks at how much sleep students in different years of school get (in minutes) at different times in the year.

Bar Chart

More information on making bar charts can be found here

bar=read.csv(here("data/bar_plot.csv"))
head(bar)
##   student  year  time minutes
## 1       1 year4 week2      15
## 2       2 year4 week2      30
## 3       3 year4 week2      17
## 4       4 year4 week2       5
## 5       5 year4 week2       5
## 6       6 year4 week2      14

There are a lot of ways you can display a bar chart. It’s very easy to switch between them.

All of these use stat="summary" and fun="mean" in geom_bar(); this is how we make sure we are plotting the mean of each category. (Older code often uses fun.y="mean". That argument was deprecated in ggplot2 3.3.0 in favour of fun.)
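To see what stat="summary" with fun="mean" is doing, here is a sketch that draws the same bars two ways. It uses the built-in ToothGrowth data as a stand-in for the sleep data.

- First, ggplot computes the means itself.
- Second, the means are computed up front with aggregate() and drawn as-is with geom_col(), which is geom_bar() with stat="identity". dplyr’s group_by() plus summarise() would work just as well as aggregate().

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- factor(tg$dose) # treat dose as a category for the x axis

# Way 1: let ggplot compute the mean of len for each dose/supp combination.
p_summary <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_bar(position = "dodge", stat = "summary", fun = "mean")

# Way 2: compute the means first, then draw them unchanged.
tg_means <- aggregate(len ~ dose + supp, data = tg, FUN = mean)
p_col <- ggplot(tg_means, aes(x = dose, y = len, fill = supp)) +
  geom_col(position = "dodge")
```

Both plots show identical bar heights. The second way is handy when you want to reuse the summary table, for example to add error bars.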

Grouped Bar Chart

ggplot(bar, aes(x=,y=,fill=))+geom_bar(position="",stat = "summary", fun = "mean")
ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat = "summary", fun = "mean")

Stacked Bar Chart

Changing the position will allow us to have different types of bar charts. To get it stacked, use position="stack".

ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="")
ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="stack",stat = "summary", fun = "mean")

Percent Bar Chart

ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar()
ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="fill",stat = "summary", fun = "mean")

To summarize:

  • grouped bar chart: position="dodge"
  • stacked bar chart: position="stack"
  • percent bar chart: position="fill"
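With position="fill", each stack is rescaled to add up to 1, so the y axis shows proportions rather than minutes. Here is a sketch that relabels the axis as percentages with scales::percent, using the built-in ToothGrowth data as a stand-in. The scales package is one that ggplot2 itself depends on.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Each bar is a stack of the two supp means, rescaled so the stack totals 1.
p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_bar(position = "fill", stat = "summary", fun = "mean") +
  scale_y_continuous(labels = scales::percent) + # 0-1 shown as 0%-100%
  ylab("Share of mean length")
p
```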

Box and Whisker Plot

Box and whisker plots are considered an improvement over the bar plot because they give a better idea of the spread of the data. You can see the median, the quartiles and any outliers, none of which are evident in a bar plot.

More information on making box and whisker plots can be found here

ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_()
ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_boxplot()
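Here is a sketch showing what the box actually marks, with the built-in ToothGrowth data as a stand-in:

- The middle line is the median.
- The box spans the first to third quartile.
- The whiskers reach the most extreme points within 1.5 times the interquartile range.
- Anything beyond the whiskers is drawn as an outlier.

The mean is not shown by default, so this sketch adds it with stat_summary().

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot() + # middle line = median
  # Overlay the mean of each group as a white diamond (shape 23).
  stat_summary(fun = mean, geom = "point", shape = 23, fill = "white", size = 3)
p
```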

Violin Plot

Violin plots are sometimes considered another level up from the box and whisker plot (so, to keep track: bar < box < violin), since they give a better, more visual idea of how the points are distributed.

More information on violin plots can be found here

ggplot(bar, aes(x=,y=,=year))+
  geom_()
ggplot(bar, aes(x=time,y=minutes,fill=year))+
  geom_violin()
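A violin doesn’t mark the median or quartiles, so a common trick is to draw a narrow boxplot inside it. Here is a sketch with the built-in ToothGrowth data standing in for the sleep data.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len)) +
  geom_violin(fill = "grey85") +
  # Narrow box inside each violin; outliers hidden so points aren't drawn twice.
  geom_boxplot(width = 0.1, outlier.shape = NA)
p
```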

You may have noticed that in all of these the x-axis categories are in the order final, midterm, week2. That is alphabetical order, which ggplot uses by default for text columns. While not a big deal, it would be nicer if they were week2, midterm, final. We will get into how to change that later; for now, we will stick to the basic plots.

Density Plots

What I have written in this chunk may not work for you. You may have to do something along the lines of density=read.csv(file.choose()) and then select the density.csv from wherever you saved it on your computer.

More information on density plots can be found here

This is a randomly generated dataset for the weights of males vs. females.

density=read.csv(here("data/denisty.csv"))
head(density)
##   X sex weight
## 1 1   F     49
## 2 2   F     56
## 3 3   F     60
## 4 4   F     43
## 5 5   F     57
## 6 6   F     58
ggplot(density, aes(x=,fill=)) + 
  geom_()
ggplot(density, aes(x=weight,fill=sex)) + 
  geom_density()

A large chunk of this figure overlaps. To see what is going on underneath, we can change the alpha (how opaque the fill is): 1 is fully opaque and 0 is fully transparent.

ggplot(density, aes(x=weight,fill=sex)) + 
  geom_density(=0.5)
ggplot(density, aes(x=weight,fill=sex)) + 
  geom_density(alpha=0.5)
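How smooth a density curve looks depends on its bandwidth. Here is a sketch using the built-in faithful data (Old Faithful eruption times, in minutes) as a stand-in. It compares values of the adjust argument of geom_density(), which multiplies the default bandwidth.

```r
library(ggplot2)

# adjust < 1: narrower bandwidth, so the curve follows the data more closely.
p_rough <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 0.5, fill = "steelblue", alpha = 0.5)

# adjust > 1: wider bandwidth, so the curve is smoother and its peaks are lower.
p_smooth <- ggplot(faithful, aes(x = eruptions)) +
  geom_density(adjust = 2, fill = "steelblue", alpha = 0.5)

p_rough
p_smooth
```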

Customizing your Plots

This is the same plot we made above; apart from axis labels, we haven’t added any customizations.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")

Themes

There are a lot of different themes you can use when making your plots. You can see a description of them all here, and I’ll show you some examples below.

ggplot(scatter,aes())+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  theme_()
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  theme_bw()

scatter_bw=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_bw")+
  theme_bw()+
  theme(legend.position = "none")
scatter_light=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_classic")+
  theme_classic()+
  theme(legend.position = "none")
scatter_dark=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_dark")+
  theme_dark()+
  theme(legend.position = "none")
scatter_minimal=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_minimal")+
  theme_minimal()+
  theme(legend.position = "none")
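If you find yourself adding the same theme to every plot, theme_set() changes the default for the rest of your R session. Here is a minimal sketch with the built-in mtcars data as a stand-in.

```r
library(ggplot2)

# Make theme_bw() the default; theme_set() returns the previous default.
old_theme <- theme_set(theme_bw())

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() # drawn with theme_bw() even though it isn't added here
p

theme_set(old_theme) # restore the previous default
```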

Patchwork

Patchwork is a lovely package that allows you to very simply arrange your plots in whatever manner you like. See the documentation here. You can see an example of how you would use it below.

(scatter_bw|scatter_light)/(scatter_dark|scatter_minimal)
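Here is a short sketch of the patchwork operators, using the built-in mtcars data as a stand-in:

- | puts plots side by side.
- / stacks them.
- Brackets group them.
- plot_layout(guides = "collect") merges identical legends into one.
- plot_annotation(tag_levels = "A") labels the panels A, B, C.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) + geom_point()
p3 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# Two plots side by side on top, one plot underneath.
combined <- (p1 | p2) / p3 +
  plot_layout(guides = "collect") +   # one shared legend for p1 and p2
  plot_annotation(tag_levels = "A")   # panel tags A, B, C
combined
```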

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("Removing All Gridlines")+
  theme_bw()+
  theme( = element_blank())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("Removing All Gridlines")+
  theme_bw()+
  theme(panel.grid = element_blank())

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("Removing All Vertical Lines")+
  theme_bw()+
  theme(= element_blank(),
       = element_blank())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("Removing All Vertical Lines")+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
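Theme elements are nested:

- panel.grid covers panel.grid.major and panel.grid.minor.
- Those in turn cover their .x and .y versions.

Blanking a parent removes all its children, while setting a child only touches that one. Here is a sketch with the built-in mtcars data as a stand-in that keeps only dashed horizontal major gridlines.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_bw() +
  theme(
    panel.grid.minor = element_blank(),   # parent: minor lines on both axes
    panel.grid.major.x = element_blank(), # child: vertical major lines only
    panel.grid.major.y = element_line(colour = "grey80", linetype = "dashed")
  )
p
```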

Modifying the Legend

There are a lot of ways you can change the legend, and you will probably need to do some research for your specific use case. There are options in theme(), in guides() (together with guide_legend()), and in a lot of other places.

This is normally where I go when I first start troubleshooting my legend problems.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ("Renaming the Legends")+
  guides(colour=guide_legend(),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("Renaming the Legends")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
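guides() is one way to rename legends. labs() is a shorter alternative that sets the title for each aesthetic’s legend, and the axes, in one place. Here is a sketch with the built-in mtcars data as a stand-in, which also moves the legend with theme(legend.position).

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl), size = hp)) +
  # Axis titles and legend titles, keyed by aesthetic name.
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders", size = "Horsepower") +
  theme_bw() +
  theme(legend.position = "bottom") # also "top", "left", "right" or "none"
p
```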

Colouring

Everybody’s favourite part of plotting. There are A LOT of different ways to colour your plots, so I advise you to explore this when the time comes. There are also plenty of preset colour schemes that look great and are easy to implement. You can find resources on them here.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_()+
  ggtitle("Scatterplot with Viridis Colouring")+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=(title="Population"))+
  theme_()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  ggtitle("Scatterplot with Viridis Colouring")+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
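The _d in scale_colour_viridis_d() stands for discrete, which suits categories like continent. For a numeric variable you would use scale_colour_viridis_c() (continuous) instead; giving a continuous scale a categorical variable raises an error. The matching scale_fill_viridis_d() and scale_fill_viridis_c() work the same way for fill. To browse the Colour Brewer palette names, RColorBrewer::display.brewer.all() draws them all. Here is a sketch with the built-in mtcars data as a stand-in.

```r
library(ggplot2)

# Discrete: one distinct colour per cylinder category.
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  scale_colour_viridis_d()

# Continuous: a gradient for a numeric column; option picks the palette.
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point(size = 3) +
  scale_colour_viridis_c(option = "magma")

p_discrete
p_continuous
```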

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_brewer()+
  ggtitle("Scatterplot with Colour Brewer")+
  xlab(" GDP Per Capita")+
 ("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  (panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_brewer(palette = "Paired")+
  ggtitle("Scatterplot with Colour Brewer")+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

If you’re going to get into manual colouring look here to figure out what kind of colours are available.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale__manual(=c("orangered3","slateblue","lightseagreen","orchid3","sienna2"))+
  ggtitle("Scatterplot with Manual Colours")+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_manual(values=c("orangered3","slateblue","lightseagreen","orchid3","sienna2"))+
  ggtitle("Scatterplot with Manual Colours")+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())
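R has a built-in list of colour names, which is handy for manual colouring: colours() returns all of them. Hex codes work too. Here is a sketch with the built-in iris data as a stand-in, mixing a hex code with named colours.

```r
library(ggplot2)

# Every colour name R recognises, e.g. "orangered3" or "slateblue".
all_cols <- colours()
# Search the names for a family of colours.
grep("orchid", all_cols, value = TRUE)

# Hex codes ("#RRGGBB") can be mixed with names in the same vector.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  scale_colour_manual(values = c(setosa = "#1B9E77",
                                 versicolor = "lightseagreen",
                                 virginica = "orchid3"))
p
```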

If you are hoping to apply a specific colour to a specific category you can do it like this.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_manual(values=c("Americas"="orangered3","Asia"="slateblue","Africa"="lightseagreen","Europe"="orchid3","Oceania"="sienna2"))+
  ggtitle("Scatterplot with Manual Colours-Assigned to Continent")+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

Facets

Sometimes when you have a lot of data it can be useful to facet your plots, that is, split them into one panel per category. This is really easy, as you can see below. The different options for facets can be seen here.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())+
  facet_grid()
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())+
  facet_grid(~continent)

You can see that the facet made our x-axis labels difficult to see. Luckily this is one of the many elements we can fix in theme()

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
       = element_text(angle=45,vjust=0.5))+
  facet_grid(~continent)
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  xlab(" GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text.x = element_text(angle=45,vjust=0.5))+
  facet_grid(~continent)
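facet_grid(~continent) puts all the panels in one row. facet_wrap() wraps the panels onto several rows instead, and scales = "free_x" lets each panel use its own x range. Here is a sketch with ggplot2’s built-in mpg data (seven car classes) as a stand-in.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # One panel per class, wrapped onto 2 rows; each panel gets its own x axis range.
  facet_wrap(~class, nrow = 2, scales = "free_x")
p
```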

Changing Factor Levels

Remember that earlier we saw that the categories in the sleep dataset weren’t in the right order. We are going to reset the factor levels, and then it should plot properly.

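bar_ofl=ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat="summary",fun="mean")
bar$time <- factor(bar$time,levels = c())
bar_rfl=ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat="",fun="mean")
(|)
# bar_ofl: original (alphabetical) order. The plot stores its own copy of the data,
# so changing bar$time afterwards does not affect it.
bar_ofl=ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat="summary",fun="mean")
# Set the order we want the categories to appear in
bar$time <- factor(bar$time,levels = c("week2","midterm","final"))
# bar_rfl: reset factor levels
bar_rfl=ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat="summary",fun="mean")
(bar_ofl|bar_rfl) # patchwork: show the two plots side by side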
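Why did the order need fixing? Since R 4.0, read.csv() reads text columns as character, and ggplot orders character values alphabetically. Here is a small base-R sketch of what factor() does with levels. forcats::fct_relevel(), loaded with the tidyverse, is another way to do the same job.

```r
sleep_time <- c("week2", "midterm", "final", "week2")

# Default levels are sorted alphabetically.
levels(factor(sleep_time))
# -> "final" "midterm" "week2"

# Explicit levels set the order used for axes and legends.
time_ordered <- factor(sleep_time, levels = c("week2", "midterm", "final"))
levels(time_ordered)
# -> "week2" "midterm" "final"

# Careful: a value missing from levels (here a "Final" typo) silently becomes NA.
time_typo <- factor(sleep_time, levels = c("week2", "midterm", "Final"))
time_typo
```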

Away from the Basics

We went through some very basic code today. But one of the things that makes ggplot so popular is how customizable it is. The following are two examples of plots that I have been using a lot in my research and work. They may look very complicated, but they only build on the basic structures we have gone over today.

Phytoplankton Gene Isoform Counts Based on Cell Diameter and Taxa

This was done with Katherine Fleury for her undergraduate honours project at Mount Allison University.

merged_data_na=read.csv(here("data/my_example.csv"))
merged_plot <- ggplot(merged_data_na,aes(x=Enzyme_Isoform,y=(Diameter),size= isoform_count,shape=ome, colour= taxa))+
  geom_point()+
  xlab("Gene Family")+
  ylab("Diameter")+
  scale_x_discrete(position ="top")+
  scale_shape_manual(values = c(16, 18),"Data Type")+
  scale_color_brewer(palette = "Paired")+
  guides(size=guide_legend(title="Isoform Count"),
         shape=guide_legend(override.aes = list(size=3)), 
         colour=guide_legend(override.aes = list(size=3),title = "Taxa"))+
  theme_bw()+
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.y=element_text(face="italic",size=12),
        axis.text.x=element_text(size=12,angle=90),
        axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))+
  facet_grid(~Enzyme_parent_ASA, scales = "free", space = "free")
merged_plot
## Warning: Removed 477 rows containing missing values (geom_point).

Network plot

You need a few other packages in order to get the data ready for this. You can see it in the Rmd version of this html (you can find it in the ReadMe).

library(ggnetwork) # geom_edges(), geom_nodes() and theme_blank() come from ggnetwork
ggplot(network_df,aes(x = x, y = y, xend = xend, yend = yend))+
  geom_edges(arrow = arrow(length = unit(6, "pt")))+
  geom_nodes(aes(colour=status),size=8)+
  geom_text(aes(label=name), check_overlap = TRUE)+
  scale_color_brewer(palette = "Accent")+
  ggtitle("Disease Transmission")+
  theme_blank()

Troubleshooting